Bias Category
Ready to Translate, Not to Represent? Bias and Performance Gaps in Multilingual LLMs Across Language Families and Domains
Sayeedi, Md. Faiyaz Abdullah, Alam, Md. Mahbub, Rahman, Subhey Sadi, Islam, Md. Adnanul, Deepti, Jannatul Ferdous, Mohiuddin, Tasnim, Islam, Md Mofijul, Shatabda, Swakkhar
The rise of Large Language Models (LLMs) has redefined Machine Translation (MT), enabling context-aware and fluent translations across hundreds of languages and textual domains. Despite their remarkable capabilities, LLMs often exhibit uneven performance across language families and specialized domains. Moreover, recent evidence reveals that these models can encode and amplify biases present in their training data, posing serious fairness concerns, especially for low-resource languages. To address these gaps, we introduce Translation Tangles, a unified framework and dataset for evaluating the translation quality and fairness of open-source LLMs. Our approach benchmarks 24 bidirectional language pairs across multiple domains using several automatic evaluation metrics. We further propose a hybrid bias detection pipeline that integrates rule-based heuristics, semantic similarity filtering, and LLM-based validation. We also introduce a high-quality, bias-annotated dataset based on human evaluations of 1,439 translation-reference pairs. The code and dataset are accessible on GitHub: https://github.com/faiyazabdullah/TranslationTangles
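The abstract names the three stages of the bias detection pipeline but not their mechanics. Below is a minimal sketch of how such a cascade might be composed, assuming a toy heuristic lexicon, a multilingual sentence encoder, and a caller-supplied `ask_llm` wrapper; all of these are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of a hybrid bias-detection cascade: rule-based heuristics ->
# semantic-similarity filtering -> LLM-based validation. Lexicon, threshold,
# encoder choice, and the ask_llm callable are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

HEURISTIC_TERMS = {"he", "she", "his", "her", "man", "woman"}  # toy lexicon

def rule_based_flag(translation: str) -> bool:
    """Stage 1: cheap lexical heuristics nominate candidate sentences."""
    return any(tok in HEURISTIC_TERMS for tok in translation.lower().split())

_encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def drifts_from_reference(translation: str, reference: str,
                          threshold: float = 0.85) -> bool:
    """Stage 2: keep only candidates that drift semantically from the
    reference (low cosine similarity between sentence embeddings)."""
    emb = _encoder.encode([translation, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() < threshold

def is_biased(translation: str, reference: str, ask_llm) -> bool:
    """Stage 3: an LLM judge confirms the flag; ask_llm is any
    prompt -> string callable (e.g., a chat-API wrapper)."""
    if not (rule_based_flag(translation)
            and drifts_from_reference(translation, reference)):
        return False
    prompt = (f"Reference: {reference}\nTranslation: {translation}\n"
              "Does the translation introduce a social bias that is absent "
              "from the reference? Answer yes or no.")
    return ask_llm(prompt).strip().lower().startswith("yes")
```

Ordering the cheap lexical stage before the encoder and the LLM judge keeps the expensive validation calls to a minimum.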
Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation
Miandoab, Kaveh Eskandari, Kamruzzaman, Mahammed, Gharooni, Arshia, Kim, Gene Louis, Sarathy, Vasanth, Mehrabi, Ninareh
Large Language Models have been shown to exhibit stereotypical biases in their representations and behavior due to the discriminative nature of the data they are trained on. Despite significant progress in developing methods and models that refrain from using stereotypical information in their decision-making, recent work has shown that approaches used for bias alignment are brittle. In this work, we introduce a novel and general augmentation framework that involves three plug-and-play steps and is applicable to a number of fairness evaluation benchmarks. By applying the augmentation to a fairness evaluation dataset, the Bias Benchmark for Question Answering (BBQ), we find that Large Language Models (LLMs), including state-of-the-art open- and closed-weight models, are susceptible to perturbations of their inputs, exhibiting a higher likelihood of behaving stereotypically. Furthermore, we find that such models are more likely to behave in a biased manner when the target demographic belongs to a community less studied in the literature, underlining the need to expand fairness and safety research to include more diverse communities.
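The three plug-and-play steps are not spelled out in the abstract. As a hedged illustration, one way to realize minimal contextual augmentation on BBQ-style items is to prepend a semantically neutral sentence and measure how often the model's answer flips; the templates and field names below are assumptions:

```python
# Illustrative sketch of minimal contextual augmentation on BBQ-style items:
# prepend a short, semantically neutral sentence and check whether the
# model's answer changes. Contexts and dict keys are invented for the sketch.
NEUTRAL_CONTEXTS = [
    "It was a quiet afternoon.",
    "The weather had been ordinary all week.",
]

def augment(item: dict, context: str) -> dict:
    """Return a copy of the BBQ-style item with the extra context prepended."""
    out = dict(item)
    out["context"] = f"{context} {item['context']}"
    return out

def flip_rate(items, answer_fn) -> float:
    """answer_fn: item -> chosen option; any LLM wrapper fits here."""
    flips, total = 0, 0
    for item in items:
        base = answer_fn(item)
        for ctx in NEUTRAL_CONTEXTS:
            total += 1
            if answer_fn(augment(item, ctx)) != base:
                flips += 1
    return flips / max(total, 1)
```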
Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
Rani, Arti, Singh, Shweta, Sahoo, Nihar Ranjan, Nayak, Gaurav Kumar
Large Language Models (LLMs) have achieved remarkable success on question answering (QA) tasks, yet they often encode harmful biases that compromise fairness and trustworthiness. Most existing bias mitigation approaches are restricted to predefined categories, limiting their ability to address novel or context-specific emergent biases. To bridge this gap, we tackle the novel problem of open-set bias detection and mitigation in text-based QA. We introduce OpenBiasBench, a comprehensive benchmark designed to evaluate biases across a wide range of categories and subgroups, encompassing both known and previously unseen biases. Additionally, we propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method that leverages adapter modules to mitigate existing social and stereotypical biases while generalizing to unseen ones. Compared to the state-of-the-art BMBI method, Open-DeBias improves QA accuracy on the BBQ dataset by nearly $48\%$ on ambiguous subsets and $6\%$ on disambiguated ones, using adapters fine-tuned on just a small fraction of the training data. Remarkably, the same adapters, in a zero-shot transfer to Korean BBQ, achieve $84\%$ accuracy, demonstrating robust language-agnostic generalization. Through extensive evaluation, we also validate the effectiveness of Open-DeBias across a broad range of NLP tasks, including StereoSet and CrowS-Pairs, highlighting its robustness, multilingual strength, and suitability for general-purpose, open-domain bias mitigation. The project page is available at: https://sites.google.com/view/open-debias25
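The abstract does not detail the adapter architecture; a generic bottleneck adapter of the kind commonly used for parameter-efficient debiasing, sketched in PyTorch (the actual Open-DeBias module may differ):

```python
# Generic bottleneck adapter sketch: a small trainable module inserted into
# a frozen backbone. Dimensions and activation are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen backbone's representation;
        # only the small down/up projections are trained for debiasing.
        return x + self.up(self.act(self.down(x)))
```

Because only the down- and up-projections are trained, such a module stays data- and parameter-efficient, which is consistent with the abstract's claim of fine-tuning on a small fraction of the training data.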
SESGO: Spanish Evaluation of Stereotypical Generative Outputs
Robles, Melissa, Bernal, Catalina, Raigoso, Denniss, Rubio, Mateo Dulce
This paper addresses the critical gap in evaluating bias in multilingual Large Language Models (LLMs), with a specific focus on the Spanish language within culturally-aware Latin American contexts. Despite widespread global deployment, current evaluations remain predominantly US-English-centric, leaving potential harms in other linguistic and cultural contexts largely underexamined. We introduce a novel, culturally-grounded framework for detecting social biases in instruction-tuned LLMs. Our approach adapts the underspecified question methodology from the BBQ dataset by incorporating culturally-specific expressions and sayings that encode regional stereotypes across four social categories: gender, race, socioeconomic class, and national origin. Using more than 4,000 prompts, we propose a new metric that combines accuracy with the direction of error to effectively balance model performance and bias alignment in both ambiguous and disambiguated contexts. To our knowledge, our work presents the first systematic evaluation examining how leading commercial LLMs respond to culturally specific bias in the Spanish language, revealing varying patterns of bias manifestation across state-of-the-art models. We also contribute evidence that bias mitigation techniques optimized for English do not effectively transfer to Spanish tasks, and that bias patterns remain largely consistent across different sampling temperatures. Our modular framework offers a natural extension to new stereotypes, bias categories, or languages and cultural contexts, representing a significant step toward more equitable and culturally-aware evaluation of AI systems in the diverse linguistic environments where they operate.
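The exact formula of the proposed metric is not given in the abstract; the sketch below uses a BBQ-style signed error rate as a stand-in for combining accuracy with the direction of error (field names are hypothetical):

```python
# Hedged stand-in for a metric combining accuracy with error direction:
# accuracy over all items plus a signed bias score over the errors only.
def sesgo_style_score(records):
    """records: dicts with 'pred', 'gold', and 'stereotyped_option' keys
    (hypothetical schema). Returns (accuracy, bias) with bias in [-1, 1]."""
    correct = sum(r["pred"] == r["gold"] for r in records)
    errors = [r for r in records if r["pred"] != r["gold"]]
    toward = sum(r["pred"] == r["stereotyped_option"] for r in errors)
    accuracy = correct / len(records)
    # +1 if every error favors the stereotype, -1 if none do, 0 if balanced.
    bias = (2 * toward - len(errors)) / len(errors) if errors else 0.0
    return accuracy, bias
```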
Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models
Warning: This research studies AI persuasion and bias amplification that could be misused; all experiments are for safety evaluation. Large Language Models (LLMs) now generate convincing, human-like text and are widely used in content creation, decision support, and user interactions. Yet the same systems can spread information or misinformation at scale and reflect social biases that arise from data, architecture, or training choices. This work examines how persuasion and bias interact in LLMs, focusing on how imperfect or skewed outputs affect persuasive impact. Specifically, we test whether persona-based models can persuade with fact-based claims while also, unintentionally, promoting misinformation or biased narratives. We introduce a convincer-skeptic framework: LLMs adopt personas to simulate realistic attitudes. Skeptic models serve as human proxies; we compare their beliefs before and after exposure to arguments from convincer models. Persuasion is quantified with Jensen-Shannon divergence over belief distributions. We then ask to what extent persuaded entities go on to reinforce and amplify biased beliefs across race, gender, and religion. Strong persuaders are further probed for bias using sycophantic adversarial prompts and judged with additional models. Our findings show both promise and risk. LLMs can shape narratives, adapt tone, and mirror audience values across domains such as psychology, marketing, and legal assistance. But the same capacity can be weaponized to automate misinformation or craft messages that exploit cognitive biases, reinforcing stereotypes and widening inequities. The core danger lies in misuse more than in occasional model mistakes. By measuring persuasive power and bias reinforcement, we argue for guardrails and policies that penalize deceptive use and support alignment, value-sensitive design, and trustworthy deployment.
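The one concrete quantity in the abstract is the persuasion measure: Jensen-Shannon divergence between a skeptic's belief distribution before and after exposure. A direct base-2 implementation follows; the three-stance belief vectors are invented for illustration:

```python
# Jensen-Shannon divergence (base 2, so the value lies in [0, 1]) between
# belief distributions before and after exposure to a convincer's arguments.
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # KL(a || b) summed only where a > 0; m > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

before = np.array([0.7, 0.2, 0.1])   # skeptic's belief over stances, before
after  = np.array([0.3, 0.4, 0.3])   # belief after the convincer's arguments
print(js_divergence(before, after))  # larger value = stronger persuasion
```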
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
Lan, Tian, Su, Xiangdong, Liu, Xu, Wang, Ruirui, Chang, Ke, Li, Jiang, Gao, Guanglai
As large language models (LLMs) are increasingly applied to various NLP tasks, their inherent biases are gradually being disclosed. Measuring bias in LLMs is therefore crucial to mitigating their ethical risks. However, most existing bias evaluation datasets focus on English and North American culture, and their bias categories are not fully applicable to other cultures. Datasets grounded in the Chinese language and culture are scarce. More importantly, these datasets usually support only a single evaluation task and cannot evaluate bias in LLMs from multiple aspects. To address these issues, we present the Multi-task Chinese Bias Evaluation Benchmark (McBE), which includes 4,077 bias evaluation instances covering 12 bias categories and 82 subcategories, and introduces 5 evaluation tasks, providing extensive category coverage, content diversity, and comprehensive measurement. Additionally, we evaluate several popular LLMs from different series and with different parameter sizes. In general, all of these LLMs demonstrate varying degrees of bias. We conduct an in-depth analysis of the results, offering novel insights into bias in LLMs.
BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models
Abhishek, Alok, Erickson, Lisa, Bandopadhyay, Tushar
In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building upon the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. They enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must exhibit highly equitable behavior in its responses, making it a rigorous standard for responsible AI evaluation. Empirical results from our experiments show that 37.65% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk of using these models in critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology to benchmark LLMs, diagnose factors driving biases, and develop mitigation strategies. With BEATS, our goal is to support the development of more socially responsible and ethically aligned AI models.
No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models
Kumar, Charaka Vinayak, Urlana, Ashok, Kanumolu, Gopichand, Garlapati, Bala Mallikarjunarao, Mishra, Pruthwik
Advancements in Large Language Models (LLMs) have improved performance on a range of natural language understanding and generation tasks. Although LLMs have achieved state-of-the-art performance on various tasks, they often reflect different forms of bias present in their training data. In light of this limitation, we provide a unified evaluation across benchmarks using a set of representative LLMs, covering forms of bias ranging from physical characteristics to socio-economic categories. Moreover, we propose five prompting approaches to carry out the bias detection task across different aspects of bias. Further, we formulate three research questions to gain insight into detecting biases in LLMs using different approaches and evaluation metrics across benchmarks. The results indicate that each of the selected LLMs suffers from some form of bias, with the LLaMA3.1-8B model being the least biased. Finally, we conclude the paper by identifying key challenges and possible future directions.
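The five prompting approaches are not enumerated in the abstract; as a hedged illustration, one generic template for prompt-based bias detection might look like the following (the prompt wording and the `ask_llm` callable are assumptions):

```python
# One hypothetical prompt-based bias probe; the paper's five approaches
# likely differ in framing (e.g., zero-shot vs. few-shot, direct vs. indirect).
BIAS_PROBE = (
    "Sentence: {sentence}\n"
    "Question: Does this sentence express or rely on a social stereotype "
    "(e.g., about gender, age, or socio-economic status)? "
    "Answer 'biased' or 'unbiased' and name the category if biased."
)

def probe(sentence: str, ask_llm) -> str:
    """ask_llm: prompt -> completion string; any chat-API wrapper fits."""
    return ask_llm(BIAS_PROBE.format(sentence=sentence)).strip().lower()
```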
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users
Hu, Mengxuan, Wu, Hongyi, Guan, Zihan, Zhu, Ronghang, Guo, Dongliang, Qi, Daiqing, Li, Sheng
Retrieval-Augmented Generation (RAG) is widely adopted for its effectiveness and cost-efficiency in mitigating hallucinations and enhancing the domain-specific generation capabilities of large language models (LLMs). However, is this effectiveness and cost-efficiency truly a free lunch? In this study, we comprehensively investigate the fairness costs associated with RAG by proposing a practical three-level threat model from the perspective of user awareness of fairness. Specifically, varying levels of user fairness awareness result in different degrees of fairness censorship of the external dataset. We examine the fairness implications of RAG using uncensored, partially censored, and fully censored datasets. Our experiments demonstrate that fairness alignment can be easily undermined through RAG without the need for fine-tuning or retraining. Even with fully censored and supposedly unbiased external datasets, RAG can lead to biased outputs. Our findings underscore the limitations of current alignment methods in the context of RAG-based LLMs and highlight the urgent need for new strategies to ensure fairness. We propose potential mitigations and call for further research to develop robust fairness safeguards in RAG-based LLMs.
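The three-level threat model maps user fairness awareness to how much of the external corpus is censored before retrieval. A minimal sketch, assuming a caller-supplied bias classifier (`is_biased`) with a strictness switch; the level names and filtering logic are illustrative:

```python
# Hedged sketch of the three censorship levels applied to a retrieval corpus
# before it is indexed for RAG. is_biased: (doc, strict) -> bool is assumed.
def censor_corpus(docs, level: str, is_biased) -> list:
    """level: 'uncensored' | 'partial' | 'full'."""
    if level == "uncensored":
        # Unaware user: the external dataset is used as-is.
        return list(docs)
    if level == "partial":
        # Partially vigilant user: drop only high-confidence biased documents.
        return [d for d in docs if not is_biased(d, strict=False)]
    if level == "full":
        # Maximally vigilant user: drop anything the classifier flags at all.
        return [d for d in docs if not is_biased(d, strict=True)]
    raise ValueError(f"unknown censorship level: {level}")
```

The abstract's key finding is that even the `full` setting does not guarantee fair outputs, since bias can re-emerge from how the model composes retrieved content.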
BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla
Kamruzzaman, Mahammed, Monsur, Abdullah Al, Das, Shrabon, Hassan, Enamul, Kim, Gene Louis
This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. In an effort to extend the focus of bias research beyond English-centric datasets, we have localized the content from the StereoSet, IndiBias, and Kamruzzaman et al.'s datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. Our BanStereoSet dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. This dataset not only serves as a crucial tool for measuring bias in multilingual LLMs but also facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in Bangladeshi contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the necessity for culturally and linguistically adapted datasets to develop more equitable language technologies.